我们提出了E3NN,这是一个通用框架,用于创建E(3)e术训练功能,也称为欧几里得神经网络。E3NN自然地在几何和几何张量上进行操作,这些几何和几何张量描述了3D中的系统,并在坐标系统的变化下可预测地转换。E3NN的核心是诸如张力生产类别或球形谐波函数之类的等效操作,这些功能可以组成,以创建更复杂的模块,例如卷积和注意机制。E3NN的这些核心操作可用于有效地阐明张量球场网络,3D可通道的CNN,Clebsch-Gordan Networks,SE(3)变压器和其他E(3)E(3)Equivariant网络。
translated by 谷歌翻译
在代理人的观察是部分或嘈杂的情况下,钢筋学习通常难以进行部分观察到的马尔可夫决策过程(POMDPS)。为了在POMDPS中寻求良好的表现,一个策略是通过一个有限的内存赋予代理,其更新由策略管理。但是,在这种情况下,策略优化是非凸的,可以导致随机初始化的训练性能不佳。通过约束内存架构,可以经验改善性能,然后牺牲最佳性以方便培训。在这里,我们在两个假设检测问题中研究了这一权衡,类似于双臂强盗问题。我们比较两个极端情况:(i)随机存取存储器,其中允许使用$ M $ Memory状态之间的任何转换,并且(ii)代理可以访问其最后一个$ M $操作和奖励的固定内存。对于(i),Q $扮演最糟糕的手臂Q $令人难以指数较小的价格为最佳政策。我们的主要结果是表明,尽管内存架构简化了^ m $,即$ q \ sim \ alpha ^ {2 ^ m} $ \ alpha <1 $。此外,我们遵守从随机初始化的训练导致(i)的结果非常差,并且由于内存架构上的约束,(ii)的结果显着更好。
translated by 谷歌翻译
理解为什么深网络可以在大尺寸中对数据进行分类仍然是一个挑战。已经提出了它们通过变得稳定的差异术,但现有的经验测量值得支持它通常不是这种情况。我们通过定义弥散术的最大熵分布来重新审视这个问题,这允许研究给定规范的典型的扩散术。我们确认对基准数据集的稳定性与基准数据集的性能没有强烈关联。相比之下,我们发现,对于普通转换的稳定性,R_F $的稳定性与测试错误$ \ epsilon_t $相比。在初始化时,它是初始化的统一,但在最先进的架构培训期间减少了几十年。对于CiFar10和15名已知的架构,我们发现$ \ epsilon_t \约0.2 \ sqrt {r_f} $,表明获得小$ r_f $非常重要,无法实现良好的性能。我们研究R_F $如何取决于培训集的大小,并将其与简单的不变学习模型进行比较。
translated by 谷歌翻译
这项工作介绍了神经性等因素的外部潜力(NEQUIP),E(3) - 用于学习分子动力学模拟的AB-INITIO计算的用于学习网状体电位的e(3)的神经网络方法。虽然大多数当代对称的模型使用不变的卷曲,但仅在标量上采取行动,Nequip采用E(3) - 几何张量的相互作用,举起Quivariant卷曲,导致了更多的信息丰富和忠实的原子环境代表。该方法在挑战和多样化的分子和材料集中实现了最先进的准确性,同时表现出显着的数据效率。 Nequip优先于现有型号,最多三个数量级的培训数据,挑战深度神经网络需要大量培训套装。该方法的高数据效率允许使用高阶量子化学水平的理论作为参考的精确潜力构建,并且在长时间尺度上实现高保真分子动力学模拟。
translated by 谷歌翻译
We address the task of open-world class-agnostic object detection, i.e., detecting every object in an image by learning from a limited number of base object classes. State-of-the-art RGB-based models suffer from overfitting the training classes and often fail at detecting novel-looking objects. This is because RGB-based models primarily rely on appearance similarity to detect novel objects and are also prone to overfitting short-cut cues such as textures and discriminative parts. To address these shortcomings of RGB-based object detectors, we propose incorporating geometric cues such as depth and normals, predicted by general-purpose monocular estimators. Specifically, we use the geometric cues to train an object proposal network for pseudo-labeling unannotated novel objects in the training set. Our resulting Geometry-guided Open-world Object Detector (GOOD) significantly improves detection recall for novel object categories and already performs well with only a few training classes. Using a single "person" class for training on the COCO dataset, GOOD surpasses SOTA methods by 5.0% AR@100, a relative improvement of 24%.
translated by 谷歌翻译
In the era of digital healthcare, the huge volumes of textual information generated every day in hospitals constitute an essential but underused asset that could be exploited with task-specific, fine-tuned biomedical language representation models, improving patient care and management. For such specialized domains, previous research has shown that fine-tuning models stemming from broad-coverage checkpoints can largely benefit additional training rounds over large-scale in-domain resources. However, these resources are often unreachable for less-resourced languages like Italian, preventing local medical institutions to employ in-domain adaptation. In order to reduce this gap, our work investigates two accessible approaches to derive biomedical language models in languages other than English, taking Italian as a concrete use-case: one based on neural machine translation of English resources, favoring quantity over quality; the other based on a high-grade, narrow-scoped corpus natively written in Italian, thus preferring quality over quantity. Our study shows that data quantity is a harder constraint than data quality for biomedical adaptation, but the concatenation of high-quality data can improve model performance even when dealing with relatively size-limited corpora. The models published from our investigations have the potential to unlock important research opportunities for Italian hospitals and academia. Finally, the set of lessons learned from the study constitutes valuable insights towards a solution to build biomedical language models that are generalizable to other less-resourced languages and different domain settings.
translated by 谷歌翻译
Modeling perception sensors is key for simulation based testing of automated driving functions. Beyond weather conditions themselves, sensors are also subjected to object dependent environmental influences like tire spray caused by vehicles moving on wet pavement. In this work, a novel modeling approach for spray in lidar data is introduced. The model conforms to the Open Simulation Interface (OSI) standard and is based on the formation of detection clusters within a spray plume. The detections are rendered with a simple custom ray casting algorithm without the need of a fluid dynamics simulation or physics engine. The model is subsequently used to generate training data for object detection algorithms. It is shown that the model helps to improve detection in real-world spray scenarios significantly. Furthermore, a systematic real-world data set is recorded and published for analysis, model calibration and validation of spray effects in active perception sensors. Experiments are conducted on a test track by driving over artificially watered pavement with varying vehicle speeds, vehicle types and levels of pavement wetness. All models and data of this work are available open source.
translated by 谷歌翻译
In recent years, image and video delivery systems have begun integrating deep learning super-resolution (SR) approaches, leveraging their unprecedented visual enhancement capabilities while reducing reliance on networking conditions. Nevertheless, deploying these solutions on mobile devices still remains an active challenge as SR models are excessively demanding with respect to workload and memory footprint. Despite recent progress on on-device SR frameworks, existing systems either penalize visual quality, lead to excessive energy consumption or make inefficient use of the available resources. This work presents NAWQ-SR, a novel framework for the efficient on-device execution of SR models. Through a novel hybrid-precision quantization technique and a runtime neural image codec, NAWQ-SR exploits the multi-precision capabilities of modern mobile NPUs in order to minimize latency, while meeting user-specified quality constraints. Moreover, NAWQ-SR selectively adapts the arithmetic precision at run time to equip the SR DNN's layers with wider representational power, improving visual quality beyond what was previously possible on NPUs. Altogether, NAWQ-SR achieves an average speedup of 7.9x, 3x and 1.91x over the state-of-the-art on-device SR systems that use heterogeneous processors (MobiSR), CPU (SplitSR) and NPU (XLSR), respectively. Furthermore, NAWQ-SR delivers an average of 3.2x speedup and 0.39 dB higher PSNR over status-quo INT8 NPU designs, but most importantly mitigates the negative effects of quantization on visual quality, setting a new state-of-the-art in the attainable quality of NPU-based SR.
translated by 谷歌翻译
Evaluating and comparing text-to-image models is a challenging problem. Significant advances in the field have recently been made, piquing interest of various industrial sectors. As a consequence, a gold standard in the field should cover a variety of tasks and application contexts. In this paper a novel evaluation approach is experimented, on the basis of: (i) a curated data set, made by high-quality royalty-free image-text pairs, divided into ten categories; (ii) a quantitative metric, the CLIP-score, (iii) a human evaluation task to distinguish, for a given text, the real and the generated images. The proposed method has been applied to the most recent models, i.e., DALLE2, Latent Diffusion, Stable Diffusion, GLIDE and Craiyon. Early experimental results show that the accuracy of the human judgement is fully coherent with the CLIP-score. The dataset has been made available to the public.
translated by 谷歌翻译
In this paper we propose a general approach to define a many-valued preferential interpretation of gradual argumentation semantics. The approach allows for conditional reasoning over arguments and boolean combination of arguments, with respect to a class of gradual semantics, through the verification of graded (strict or defeasible) implications over a preferential interpretation. As a proof of concept, in the finitely-valued case, an Answer set Programming approach is proposed for conditional reasoning in a many-valued argumentation semantics of weighted argumentation graphs. The paper also develops and discusses a probabilistic semantics for gradual argumentation, which builds on the many-valued conditional semantics.
translated by 谷歌翻译